ATTIC is an integrated approach for predicting A-to-I RNA editing sites in three species

Abstract A-to-I editing is the most prevalent RNA editing event, which refers to the change of adenosine (A) bases to inosine (I) bases in double-stranded RNAs. Several studies have revealed that A-to-I editing can regulate cellular processes and is associated with various human diseases. Therefore, accurate identification of A-to-I editing sites is crucial for understanding RNA-level (i.e. transcriptional) modifications and their potential roles in molecular functions. To date, various computational approaches for A-to-I editing site identification have been developed; however, their performance is still unsatisfactory and needs further improvement. In this study, we developed a novel stacked-ensemble learning model, ATTIC (A-To-I ediTing predICtor), to accurately identify A-to-I editing sites across three species, including Homo sapiens, Mus musculus and Drosophila melanogaster. We first comprehensively evaluated 37 RNA sequence-derived features combined with 14 popular machine learning algorithms. Then, we selected the optimal base models to build a series of stacked ensemble models. The final ATTIC framework was developed based on the optimal models improved by the feature selection strategy for specific species. Extensive cross-validation and independent tests illustrate that ATTIC outperforms state-of-the-art tools for predicting A-to-I editing sites. We also developed a web server for ATTIC, which is publicly available at http://web.unimelb-bioinfortools.cloud.edu.au/ATTIC/. We anticipate that ATTIC can be utilized as a useful tool to accelerate the identification of A-to-I RNA editing events and help characterize their roles in post-transcriptional regulation.


INTRODUCTION
RNA editing is a post-transcriptional process that alters the genomic template by adding, deleting or replacing nucleotides, thereby increasing the complexity of RNAs to regulate gene expression [1].In humans, the most prevalent RNA editing event is A-to-I RNA editing, i.e. the conversion of adenosine (A) to inosine (I) occurring in both coding and noncoding transcripts, which is medicated by ADARs (adenosine deaminases that act on RNAs) [2].Throughout the entire catalytic process, ADARs bind double-stranded RNA structures and subsequently delaminate A to I, which is recognised as guanosine (G) by the cellular machinery [3,4].A-to-I RNA editing has been demonstrated to impact numerous critical genetic processes, including alternative splicing regulation, RNAi activity modification, RNA localisation, interference with microRNA function, circular RNA biogenesis and nuclear retention [3,[5][6][7].Besides, A-to-I RNA editing has been linked to several human diseases, such as neurological, cancer, cardiovascular and carcinogenic diseases [5,[8][9][10][11][12].Therefore, understanding A-to-I editing is critical for investigating molecular functions and developing targeted drugs.To date, tremendous effort has been made to identify A-to-I editing sites leveraging various techniques experimentally.High-throughput RNA sequencing is a powerful tool for detecting RNA editing events.It involves sequencing RNA samples and then comparing the resulting sequences to a reference genome.Any differences between the RNA samples and the reference genome can be identified and analysed to determine whether they represent RNA editing events.One major advantage of this approach is that it is relatively simple and straightforward.High-throughput RNA sequencing data can be easily obtained using standard laboratory techniques, and the resulting data can be analysed using a variety of software tools.Numerous A-to-I RNA editing sites in Homo sapiens and Mus musculus have been effectively characterised and curated in several public databases, such as DARNED, RADAR and REDIportal [4,[13][14][15][16][17].However, there are also limitations to using high-throughput RNA sequencing for RNA editing detection.A major limitation is that this approach requires a high-quality reference genome-if a reference genome is not available or incomplete, it may be difficult or impossible to accurately identify RNA editing events.Additionally, highthroughput RNA sequencing data can be complex and difficult to analyse, particularly for large datasets [18][19][20].
De novo prediction tools offer an alternative approach to identifying RNA editing events.These tools are computational methods, such as machine learning (ML)-based algorithms to predict potential RNA editing sites based on sequence features and other factors.An advantage of de novo prediction tools is that they can be used even in the absence of a reference genome.In addition, these tools can be highly specific and can identify novel RNA editing events that may be missed by high-throughput RNA sequencing.However, de novo prediction tools also have some limitations.
They typically rely on complex computational algorithms, and the accuracy of the predictions can be affected by a variety of factors, including the quality of the input data and the choice of parameters used by the algorithms.Therefore, it is desirable to develop appropriate algorithms to ensure the accuracy and reliability of these approaches.
To date, various ML-based predictors have been proposed for the computational identification of A-to-I RNA editing sites.These approaches have been summarised in Table 1, where we highlighted the corresponding ML algorithms, feature encoding schemes, performance evaluation metrics and strategy, web server/software availability and related species.Based on the support vector machine (SVM), Chen et al. proposed PAI to identify A-to-I RNA editing sites in Drosophila melanogaster by combining pseudo dinucleotide composition (PseDNC) with physicochemical characteristics in 2016 [21].The following year, they shifted the research focus to H. sapiens and developed iRNA-AI by integrating chemical properties of nucleotides and nucleotide density (ND) to predict the A-to-I RNA editing [22].Later, iRNA-3typeA was implemented to detect three types of RNA editing events in H. sapiens and M. musculus, including m 1 A, m 6 A and A-to-I RNA editing [23].In 2019, Ahmad et al. designed EPAI-NC to discover Ato-I sites in D. melanogaster by employing both kmer compositions (Kmer) and k-spaced nucleic acid pair composition (CKSNAP) [24].Subsequently, other algorithms started to emerge.known as iMRM, which can simultaneously predict four separate RNA editing events in H. sapiens, M. musculus and Saccharomyces cerevisiae.However, iMRM only made identifying A-to-I RNA editing in H. sapiens possible.In contrast to previous work, iMRM was developed based on the eXtreme Gradient Boosting (XGBoost) algorithm combined with five different types of features and employed a feature selection method to enhance the stability and performance of the predictor [26].In addition, RDDpred [27], RED-ML [28], DeepRed [29], TAE-ML [30], RDDSVM [31], and EditPredict [32] have been developed based on sequencing features and ML approaches.
Despite the computational advances in A-to-I editing prediction, these approaches have several limitations.First, an accurate, multi-species platform for predicting A-to-I RNA editing sites is lacking.Most of the existing methods can only predict A-to-I sites in D. melanogaster (e.g.PAI, EPAI-NC and PRESa2i) and cannot perform accurate predictions for other species.A-to-I editing is a complex biological process that can be inf luenced by various factors, including species-specific factors such as the expression level of ADAR enzymes and their isoforms and RNA editing site accessibility and regulation.The A-to-I editing level and patterns can vary across different species [4,[33][34][35].Second, most existing tools rely on a single classifier or a few ML algorithms, which limits the prediction performance.For instance, PAI, iRNA-Ai and iRNA-3typeA all employed SVM to construct their models and only achieved relatively poor performance.Last but not least, the existing methods lacked a comprehensive profiling of A-to-I RNA modification sites and an adequate interpretation of their prediction results, such as iMRM and iRNA-3typeA.
To address these limitations, herein we proposed a novel stacking ensemble learning predictor, ATTIC (A-To-I ediTing predICtor), to accurately identify A-to-I RNA editing sites from RNA sequences in multiple species, including H. sapiens, M. musculus and D. melanogaster, based on our elaborate analysis of 37 different types of RNA sequence features and evaluation of 14 widely applied ML algorithms.We then constructed a series of stacked ensemble learning models based on selecting the most informative RNA sequence features and the most effective ML algorithms for specific species.Leave-one-out cross-validation (LOOCV) test results demonstrated that ATTIC significantly improves the prediction performance across all three species compared with stateof-the-art predictors.Furthermore, the Shapley Additive exPlanation algorithm (SHAP) was employed to interpret and highlight the most important features used in ATTIC, improving the model interpretability of ATTIC.

The overall framework of ATTIC
Figure 1 shows the workf low of designing and evaluating ATTIC, including five major steps: data collection, feature engineering, model training, model evaluation and model deployment.In the first step, we retrieved and collected the A-to-I editing sites for three species, including H. sapiens, M. musculus and D. melanogaster.In the second step, we extracted 37 different types of RNA sequence features and employed 14 widely used ML algorithms to select the optimal feature encoding schemes for each species.In the third step, we built a set of ensemble models by combining various base classifiers with the stacking method and optimized the model with different feature selection strategies.In the fourth step, we comprehensively assessed the improved stacked models via LOOCV and independent tests and benchmarked ATTIC against existing state-of-the-art A-to-I editing site predictors.Finally, a web server and a stand-alone tool of ATTIC were implemented in order to provide convenient and high-throughput predictions of A-to-I editing sites.

Data collection
In this study, we integrated the datasets from multiple previous studies.Specifically, PAI [21], EPAI-NC [24], PRESa2i [25], PAI-SAE [36] and iAI-DSAE [37] were all trained and assessed using the D.  [38].All these predictors detected A-to-I RNA editing by incorporating computational prediction and experimental validation.In addition, several tools have been developed to predict A-to-I RNA editing sites for H. sapiens, such as iRNA-Ai [22], iMRM [26] and MultiRM [39], whose datasets were derived from DARNED [13].Likewise, originating from DARNED, iRNA-3typeA is the only tool available for identifying A-to-I RNA editing sites in M. musculus [23].Relying on the aforementioned database and sequencing data, Chen et al. [21][22][23] constructed all three datasets successively by applying the CD-HIT program [40] to remove redundant sequences with the 60% sequence similarity threshold.Therefore, we constructed our benchmark datasets for H. sapiens (denoted as H), M. musculus (denoted as M) and D. melanogaster (denoted as D) from these previous studies.More specifically, H was collected from 'iMRM', M from 'iRNA-3typeA' and D from 'PAI' [21,23,26], respectively.As shown in Table 2, H_3000, M_831 and D_125 are three benchmark datasets with 3000, 831 and 125 positive samples, respectively.Since small datasets might experience overfitting issues during training, we also further utilized an independent test dataset D_300 to prove the model's prediction ability for D. melanogaster.
Based on Yu and colleagues' work, Chen et al. constructed D_300 which contains 300 positive samples [21,41].All positive sequence samples are 51 nt sequences with an A at the centre in datasets H_3000, D_125 and D_300, while the positive samples are 41 nt sequences with an A in the centre in dataset M_831.It should be noted that all these datasets contain the same number of negative samples as the positives.The datasets used in this study are provided in Supplementary Data available online at http://bib.oxfordjournals.org/.

Feature engineering
The feature representations have a significant impact on the performance of ML models.In order to discover the effective feature encoding schemes, we extracted 37 RNA sequence encoding methods using our iFeatureOmega package [42].RNA sequences are composed of four types of nucleic acids (A, C, U and G).Therefore, there are 16 dinucleotide combinations theoretically denoted as AA, AC, AG, AU, CA, . . ., and UU.After the rough assessment of the 37 encoding methods using 10-fold CV, we selected six to eight encodings according to our preliminary results for each species based on different feature groups, including nucleic acid composition group, pseudo nucleic acid composition group, residue composition group and physicochemical property group.Through feature engineering, RNA sequences were converted to a fixed-length numerical feature vector.For H. sapiens, we used features of the enhanced nucleic acid composition (ENAC) and NCP from the nucleic acid composition group, position specific of two nucleotides (PS2), and binary from the residue composition group.For M. musculus, we also applied features of NCP and ENAC from the nucleic acid composition group, binary and dinucleotide binary encoding (DBE) from the residue composition group as well as DPCP type2 (DPCP2) from the physicochemical property group.Moreover, five features were utilized for D. melanogaster, including CKSNAP, Kmer and adaptive skip dinucleotide composition (ASDC) from the nucleic acid composition group, PseDNC from the pseudo nucleic acid composition and DPCP from the physicochemical property group.All feature encoding methods used in this study from different groups are given in the following section, and encodings used for comparison refer to Supplementary Data available online at http://bib.oxfordjournals.org/.

Group 1: nucleic acid composition features kmer composition
Kmer denotes the frequency of occurrence of k neighbouring nucleic acids in an RNA sequence [43,44].K can be denoted with different k values (k = 1, 2, 3, 4 . . .).In particular, k was set to 2 (di-), 3 (tri-), 4 (tetra-) and 5 (penta-nucleotide composition) in this study.Taking k = 4 as an example, kmer can be calculated as follows: where N(t) is the number of kmer type t, and N is the length of the nucleotide sequence.

Composition of k-spaced nucleic acid pairs
When using k nucleic acids (k = 0, 1, . . ., 5) separate nucleic acid pairs, CKSNAP refers to the frequency of those separated nucleic acid pairs [45].Taking k = 2 as an example, there are 16 pairs of nucleic acids that are separated by zero, one and two separately.Then, the feature vector of CKSNAP can be calculated as follows: In formula (2), the value of each descriptor indicates the composition of the nucleic acid pair matching it in the nucleotide sequence.For example, if the nucleic acid pair AA occurs m times in the nucleotide sequence, its composition is m AA divided by the total number of 2-spaced nucleic acid pairs in the nucleotide sequence (m Total ).For different k, the values of the total number are P − k − 1, where P is the length of a nucleotide sequence.In this study, k was set to 2, and we finally obtained a 48 (16×3)dimensional feature vector.

Adaptive skip dinucleotide composition
ASDC considers correlation information between neighbouring nucleotides and between intervening nucleotides [46].For a given RNA sequence, the feature vector of ASDC is explained as follows: ( 3 ) In formula (3), f vi is defined as: where L denotes the sequence length and O g i represents the occurrence number of the i-th (i = 1, 2, . . ., 16) g-gap (g = 1, 2, . . ., L − 1) dinucleotide.Hence, f νi denotes the occurrence frequency of all possible dinucleotides (i.e.AA, AC, AG, . . ., UU).

Enhanced nucleic acid composition
ENAC calculates the NAC based on a fixed-length sequence window that slides continuously from the 5 to 3 terminal of the nucleotide sequence and can be used to encode nucleotide sequences of equal length.Therefore, the composition of the enhanced nucleic acid can be calculated as In this formula, k signifies the sliding window's size, N t,p denotes the number of nucleotides A in the sliding window p, t [A, C, G, U], and p = 1, 2, . . ., L − k + 1 denotes the sliding window's position.For the proposed ATTIC method, the sliding window size k was set to 2, with a 200 (= 4 × 50)-dimensional feature vector.

Nucleotide chemical property
Each nucleotide in the RNA sequence has a unique chemical structure and chemical structure-binding properties [47].These chemical characteristics of different types of nucleotides can be categorized into three different groups (Table 3).
To encode these chemical qualities in RNA, three coordinates (x, y, z) were created to represent three chemical groups and give them values of 1 or 0. Thus, the sequence's nucleotide s i = (x i , y i , z i ) may be represented as follows: where the coordinate value of each nucleotide is determined by the nucleotide's chemical properties: A was represented by coordinates (1, 1, 1), C was represented by coordinates (0, 1, 0), G was represented by coordinates (1, 0, 0) and U was represented by coordinates (0, 0, 1).

Dinucleotide physicochemical properties
There are a total of 16 dinucleotide combinations theoretically.
Each of these has different physicochemical properties for the formulation of pseudo dinucleotide composition, which are defined as where f is the frequency of the dinucleotide i, and dpcp(i) is one of the i-th dinucleotide's physicochemical qualities.In this work, DPCP was represented as a 96-dimensional vector (16 dinucleotides×6 physicochemical characteristics), and the specific physicochemical properties include 'rise', 'roll', 'shift', 'slide', 'tilt' and 'twist'.

DPCP type2
Different from DPCP, DPCP2 can be encoded as a matrix as follows: where PC j (N L−1 N L )represents the j-th physicochemical dinucleotide value of the dinucleotide N L−1 N L , and L is the RNA sequence's length [42].ATTIC utilized the physicochemical properties of the 'rise', 'roll', 'shift' and 'slide'.

Pseudo nucleic acid composition
In light of the successful use of pseudo amino acid composition (PseAAC) for predicting protein and peptide sequences, the researchers expanded its key concept to pseudo nucleic acid composition (PseNAC) and successfully applied it to the prediction of RNA and DNA sequences [48].Chen et al. [49] proposed the PseDNC coding scheme, which is based on the PseAAC algorithm and incorporates contiguous local and global sequenceorder information into the nucleotide sequence's feature vector [37].The PseDNC encoding is defined: where d k is defined as In formula (10), ) is the normalized occurrence frequency of dinucleotide in the nucleotide sequence, λ represents the highest counted rank (or tie) of the correlation along the nucleotide sequence, w is a weight factor ranging from 0 to 1, and θ j (j = 1, 2, . . ., λ) is the j-tier correlation factor and is defined as follows: where the correlation function is defined: In formula (12), μ is the number of physicochemical indices.Six indices ('rise', 'roll', 'shift', 'slide', 'tilt' and 'twist') were set as the default indices for RNA.P u R i R i+1 denotes the number of the u-th (u = 1, 2, . . ., μ) physicochemical index of the dinucleotide R i R i+1 at position i, while P u R j R j+1 represents the equivalent value of the dinucleotide R j R j+1 at position j.In this study, the value of λ was set to 2.

Stacking ensemble learning
Stacking ensemble learning methods have been widely used to improve the generalization performance of single algorithms in bioinformatics tasks [55][56][57][58][59].The specific step of the stacking strategy is to select a collection of base classifiers and then use their outputs as the input for a meta-classifier model.In this study, A-to-I RNA editing prediction is a binary classification task, i.e. the model was designed to predict whether a given RNA sequence contains A-to-I RNA editing sites.To determine the base classifiers for each species, we first comprehensively measured the prediction performance of 14 popular ML algorithms for each of the RNA encoding schemes aforementioned, including Catboost [60], Random Forest (RF), Extra Trees (ET) [61], Gradient Boosting Decision Tree [62], SVM, Ridge, Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Logistic Regression (LR), Adaptive Boosting (AdaBoost), DT, K-nearest neighbours (KNN), Quadratic Discriminant Analysis and Light Gradient Boosting Machine (LightGBM) [63].By building these classifiers based on the Scikit-learn package in Python [64] with ten times 10-fold CV, we then selected six base classifiers that have good prediction performance in all three species.A brief introduction of these six base classifiers is presented in Supplementary Data available online at http://bib.oxfordjournals.org/.

Feature selection
The high-dimensional feature sets probably contain irrelevant or redundant information, which could inf luence the performance and efficiency of the model training [65].Therefore, it is essential to identify informative features to enhance the model performance [66,67].In this study, we employed two amended incremental feature selection (IFS) strategies to improve the predictive performance of trained models and selected the optimal design for each species-specific model.The two critical factors for IFS strategies are the feature ranking method and the base classifier, as IFS ranks and selects the features according to the classification performance [68].The detailed procedures of these two IFS strategies are provided in Algorithms 1 and 2, respectively.For each species, we randomly divided the benchmark dataset into two subsets, one for training and the other for performance evaluation.In Algorithm 1, C n is the classifier set, and F m is the original feature set.For each classifier c i in the C n , the features in the F m were sorted in descending order according to the feature importance, and the sorted feature set was denoted as F m = f 1 , f 2 , . . ., f m (step 2).Then, we selected the top k features from F m to conduct 10 times 10-fold CV and recorded the average MCC and area under the receiver operating characteristic curve (AUC) values (steps 3-6).Subsequently, we determined the optimal feature subset for c i by the maximum MCC and AUC value (step 7).If more than one subset achieved the best MCC, we determined the optimal one by comparing their AUC values.We then retrained the base classifier c i by using the optimal feature F i at step 8. C n at step 10 is the base classifier set optimized after IFS, and F is the union optimal feature set.Finally, we ensembled the base classifiers c i to build the stacked model stacked1 and return it at steps 12-13.As for Algorithm 2, the original feature set F m is ranked based on the feature importance generated by the best classifier in F m .Similarly, in steps 2-6, we selected the top k features of F m as f k and trained a stacked ensemble model stackedModel based on C n .Then, we evaluated the performance of stackedModel by 10 times 10-fold CV and record the average MCC and AUC values.Subsequently, we used the optimalFeatureSet () function to get the optimal feature subset F and train the stacked model stacked2 at steps 7-9.We used these two algorithms to train two different stacked ensemble models for each species, and the model with better predictive performance was used as the optimal model.for k (1, m) do: 4: End for 7:

Performance evaluation
In this study, we evaluated the model performance of ATTIC using three cross-validation strategies, including (i) the 10-fold crossvalidation (CV) test, (ii) the leave-one-out (LOO) test and (iii) the independent test.The 10-fold CV test was used to select the optimal features and base classifiers.Then, we conducted LOO and independent tests to evaluate and benchmark the predictive performance of ATTIC with existing methods.As for performance evaluation metrics, we employed Recall, Specificity (Sp), Precision = TP TP + FP ( 17) where true positive (TP), true negative (TN), false positive (FP) and false negative (FN) denote the numbers of correctly identified positive samples, correctly identified negative samples, incorrectly identified positive samples and incorrectly identified negative samples, respectively.We also calculated the AUC in addition to these metrics.

Sequence pattern analysis
To better understand the sequence patterns of the RNA A-to-I editing sites, we analysed the patterns of the sequences in our dataset using Two sample logo [69], which is a web-based tool that detects and displays statistically significant differences in position-specific symbol compositions between two groups of sequences [69].Figure 2A-C shows the base preferences for H. sapiens, M. musculus and D. melanogaster, respectively.The top panels represent positive samples, and the bottom panels represent negative samples.It can be concluded that the sequence patterns show different preferences around the adenine (A) sites at the centre.For H. sapiens, at position 27, guanine (G) is the most prevalent base, followed by U and C. In terms of the sequence pattern for M. musculus, U and C are the two bases that frequently occur, while A and G are the most frequent downstream bases.At the same time, the sequence logo of D. melanogaster demonstrates that RNA sequences are enriched in G and C at most nearby positions.

Comparison of different feature encodings and base classifiers
To build a robust model, we comprehensively evaluated 37 RNA encoding schemes based on 10-fold CV results.We employed the ET [61] algorithm to conduct the experiments due to its relatively good predictive performance, extensive usage and rapid training time [42,70,71].Consequently, we selected six to eight optimal feature encodings for each species based on the performance of ET.Based on the optimal feature encoding schemes selected by ET for each species, we then measured the prediction ability of 14 popular ML algorithms with respect to different species.To guarantee both the variety of classifiers and prediction ability for all three species, we ultimately selected three linear-based models, including LR, Ridge and LDA; and three tree-based models, including ET, Catboost and RF, as the base classifiers.Subsequently, we performed parameter optimization (e.g.kmer size and sliding window size) for the selected RNA encoding schemes to determine the optimal parameters for each encoding and built 132 models (22 encodings × 6 classifiers) accordingly.The optimal parameters of each feature are provided in Supplementary  of accuracy.These performance comparison results reveal that tree-based classifiers performed better than linear-based classifiers across three species.Notably, Catboost and ET performed reasonably well in all three datasets; however, Ridge was marginally superior in D. melanogaster.In summary, CKSNAP is an optimal encoding for the H. sapiens dataset, while NCP is optimal for both the M. musculus and D. melanogaster datasets.

Performance comparison of different ensemble strategies
To identify the optimal base classifiers from the six candidates to build the ensemble model, we assessed five different combinations of the base classifiers for each species using the 10-fold CV on the benchmark datasets.Given that stacking is time consuming, we adopted the optimal encoding for each species selected in Comparison of different feature encodings and base classifiers section to build the stacked ensemble models.The design of different ensemble methods follows the general principles: the two classifiers with good prediction accuracy are preferred for stacking, and the meta-model is selected accordingly to determine the strategy that provides the best predictive performance.In addition, considering that linear classifiers are not superior in general, we appropriately added linear classifiers for stacking to observe whether there are improvements in the overall prediction ability.Figure 3A-C and Supplementary Table S8 available online at http://bib.oxfordjournals.org/represent the performance evaluation results of five different stacked models trained on three species-specific datasets.Regarding H. sapiens, despite relatively lower performance in recall and F 1 , the results revealed that Ensemble3 secured higher prediction performance in terms of ACC, MCC, Precision and AUC.Consequently, we selected Ensem-ble3 for H. sapiens, which used ET and RF as the base classifiers and employed ET as the meta-classifier.In terms of M. musculus, it should be noted that the model performance did not improve as the number of base classifiers increased.Ensemble1 and Ensemble2 achieved higher prediction performance (an accuracy of 0.9914) than the other three ensemble models.As shown in Figure 3B, Ensemble1 was superior to Ensemble2 due to its better Precision and AUC.Besides, we found that incorporating the linear model does not increase the prediction performance; therefore, we employed CatBoost and ET as the base classifiers and ET as the meta-classifier (Ensemble1) to build the stacked model.Owing to the smaller training dataset of D. melanogaster compared to the other two species, we employed more base classifiers to build stacked ensemble models to enhance the prediction accuracy and attempted to balance the number of treebased and linear-based base classifiers.To serve these purposes, Ensemble5 is more suitable than Ensemble1.Besides, Figure 3C suggests that Ensemble5 secured the best predictive performance in terms of ACC (0.7882), MCC (0.5862), Precision (0.7570) and AUC (0.8236), together with a lower variance compared to Ensemble1.In summary, we applied Ensemble3 for H. sapiens, Ensemble1 for M. musculus and Ensemble5 for D. melanogaster in the following experiments.

Performance comparison of different feature combinations
In this section, we optimised the stacked ensemble models selected in the Performance comparison of different ensemble strategies section for each species using a wide range of feature encoding combinations.Figure 3D-F and Supplementary Table S9 available online at http://bib.oxfordjournals.org/show the 10-fold CV test results of different encoding combinations on the benchmark datasets.To ensure an objective comparison, we performed all the 10-fold CV tests based on the same partition of the training datasets.As diverse features are highly likely to provide more valuable information to enhance the effectiveness of models, we removed the features that are obviously inferior to their counterparts shown in the Comparison of different feature encodings and base classifiers section for different species and tested the predictive performance by incorporating the features from different groups.For H. sapiens, the new feature combination Combin1 achieved the best performance in all performance metrics except for AUC, with ACC of 0.9124, MCC of 0.8331, Recall of 0.8435, Precision of 0.9788 and F 1 score of 0.9058.For M. musculus, we only examined two encoding combinations and slightly altered the settings of DPCP2, as the results in Figure 2E demonstrate that the performance of these five coding schemes did not vary significantly.As shown in Figure 3E, Combin1 is the superior encoding approach for M. musculus, evidenced by its outstanding predictive performance across all metrics.In terms of D. melanogaster, the ACC, MCC, Recall and F 1 scores were achieved at 0.7000, 0.4015, 0.7611 and 0.7219, respectively, by Combin5, which outperformed the other two combinations.This is consistent with our previous findings in the Comparison of different feature encodings and base classifiers section that CKSNAP, Kmer, ASDC, PseDNC and DPCP are the best-performing encoding schemes.We ultimately selected Combin1 for H. sapiens and M. musculus, and Combin3 for D. melanogaster, respectively.

Performance comparison among different feature selection strategies
We trained the stacked ensemble models for each species according to the optimal ensemble strategies and feature combinations selected in the previous sections.However, the initial feature sets are likely to have some redundant and noisy features, which may have a negative impact on model performance.Therefore, we aimed to conduct feature selection to uncover informative feature subsets to enhance the prediction performance of stacked ensemble models.To this end, we compared two types of twostep feature selection strategies to determine the optimal feature subset for each species-specific model.Figure 4A-C illustrates the hold-out results of two feature selection strategies among three species.The number of selected features by each feature selection strategy and the corresponding prediction performance are provided in Supplementary Table S10 and Supplementary Figure S1 available online at http://bib.oxfordjournals.org/.As shown in Figure 4A, Strategy2 achieved the best prediction performance in terms of ACC, Recall and F 1 score for H. sapiens, although the initial feature set got the best MCC, AUC and Precision.As we aim to improve prediction accuracy and reduce unnecessary features, Strategy2 was selected to optimize the H. sapiens model.As illustrated in Figure 4B, neither of the feature selection strategies enhanced the prediction performance on the M_831 dataset.However, given that the number of features selected by Strategy1 (i.e.163) is smaller than that by Strategy2 (i.e.322), we adopted Strategy1 to accelerate the model construction and avoid overfitting.In comparison, for D. melanogaster, it is worth noting that both feature selection strategies are helpful for improving prediction performance on the D_125 dataset, and Strategy1 outperformed Strategy2 in all assessment metrics except for Recall, with ACC of 0.8180, MCC of 0.6191, Precision of 0.8250, F 1 score of 0.8250 and AUC of 0.8868, respectively.Therefore, Strategy1 was selected for D. melanogaster model optimisation.

ATTIC outperformed state-of-the-art approaches for A-to-I editing site prediction
The final ATTIC framework was developed upon the optimal models improved by the feature selection strategy for specific species.We benchmarked ATTIC with several state-of-the-art approaches for different species: PAI [21], PAI-SAE [36] and iAI-DSAE [37] based on the D_125 dataset for D. melanogaster; iRNA-Ai [22] and iMRM [26] based on the H_3000 dataset for H. sapiens;   and iRNA-3typeA [23] based on the M_831 dataset for M. musculus.
We evaluated the performance of these models using the leaveone-out strategy, which was used in these methods.The ATTIC model of each species was built using the optimized features, base classifiers, feature selection approaches and stacking strategies concluded from our previous sections.Table 4 shows that ATTIC constantly outperformed the benchmarked methods regarding ACC, MCC, Recall and Sp scores regardless of species.Specifically, ATTIC outperformed other methods regarding all performance measures for A-to-I editing site prediction in D. melanogaster.Furthermore, ATTIC achieved satisfactory performance on the M_831 dataset.Except for Sp (0.9952), the other three measures improved significantly compared to iRNA-3typeA with ACC of 0.9946, MCC of 0.9892 and Recall of 0.9940.For all three species, we additionally provided AUC for further explanation.In addition, we assessed ATTIC and the other two methods, including PAI [21] and EPAI-NC [24], on the independent dataset (D_300).Figure 4D and Supplementary Table S11 available online at http://bib.oxfordjournals.org/ show the close prediction accuracy compared to the leaveone-out test results and indicate that ATTIC outperformed EPAI-NC and PAI in predicting A-to-I editing sites for D. melanogaster.Furthermore, to assess the generalization of all three models, we also conducted the independent tests by randomly selecting 30% of the samples as independent data and using the remaining 70% to train the model.The results of the independent test are provided in Supplementary Table S12 available online at http:// bib.oxfordjournals.org/.We can see that the performance results are similar to those on LOOCV, suggesting that ATTIC has a good generalization ability.In conclusion, ATTIC shows superior predictive performance across three species, and the improved prediction performance enables a more accurate prediction in Ato-I RNA editing sites.

Improving the model interpretation of ATTIC
We applied the SHAP algorithm to interpret the feature importance to explain the outstanding prediction performance and robustness of the ATTIC model [72,73].Specifically, we employed the kernel SHAP, which can interpret the complicated stacked structure of ATTIC.The kernel SHAP estimates the SHAP values using a weighted local linear regression model [74].
Figure 5 shows the top 20 features ranked by the mean SHAP values integrated with all the testing samples.We noticed that nine PS2 features (PS2_634, PS2_397

Development of web server
To facilitate the community-wide usage of ATTIC for identifying potential A-to-I RNA editing sites, we implemented an online web server and a local stand-alone software package of the developed ATTIC approach.The web server is publicly available at https:// web.unimelb-bioinfortools.cloud.edu.au/ATTIC/, while the standalone software package can be downloaded from GitHub (https:// github.com/Cassie818/ATTIC).The web server of ATTIC is maintained with an Apache HTTP server configured on a 16-core Ubuntu server equipped with 32GB RAM and 500GB hard disk supported by the Research Cloud of The University of Melbourne.
The web server requires a few steps to predict the potential Ato-I RNA editing sites for a given RNA sequence.First, users are required to input or upload the query RNA sequences in the FASTA format.Next, users need to specify the species of interest and the corresponding threshold.Finally, the job will be submitted to the server by clicking the 'Submit' button.After the job is finished, the generated results will be returned to the webpage or emailed to the user's email address if provided.The results webpage lists the query sequence, predicted probabilities, species types and predicted results (A-to-I editing site or not).In addition, the help webpage of the web server provides detailed guidance for users to understand and interpret the results.

Limitations and future work
Despite the promising performance of ATTIC in predicting A-to-I RNA editing sites across the three different species, it has the following limitations.
The first major limitation is that the training dataset used in ATTIC is relatively limited.To ensure that the dataset used in this study is consistent in distribution and that a stable and robust machine learning model could be trained, we only utilized the data retrieved from the DARNED database.In future work, we plan to develop deep learning-based methods to enable the integration of multiple datasets from different databases, such as RADAR and REDIportal [4,14,17], to explore the possibility of further enhancing the predictive performance.
The second major limitation is that ATTIC is unable to predict tissue-specific A-to-I RNA editing sites.Previous studies have shown that A-to-I RNA editing can significantly vary across cells, tissues and developmental stages, thereby affecting multiple different aspects of RNA metabolism [75].Accordingly, it would be of particular value to develop tissue-specific predictors based on the tissue-specific RNA editing knowledge annotated by RNA editing databases such as RADAR and REDIportal [4,14,17] in future work.

CONCLUSION
In this study, we proposed ATTIC, a novel stacking-based predictor for identifying A-to-I RNA editing sites in H. sapiens, M. musculus and D. melanogaster.We first comprehensively evaluated the prediction performance of 14 widely established ML models with 37 types of RNA sequence features and then built a series of stacked ensemble models for optimisation.We then built the final ATTIC model for each species upon the optimal feature combinations, stacking strategies and feature selection approaches.The LOO and independent test results demonstrate that ATTIC outperformed state-of-the-art tools for predicting A-to-I editing sites in three species.Besides, we employed the SHAP algorithm to analyse the contributions of important features to the superior performance of ATTIC.Finally, we developed a user-friendly web server and software package to facilitate community use of ATTIC.To the best of our knowledge, ATTIC is the first stacking-based predictor that enables the detection of A-to-I RNA sites in H. sapiens, M. musculus and D. melanogaster.The success of ATTIC can be attributed to the following factors: (i) the comprehensive selection of the optimal ensemble strategies and feature combinations for each species; (ii) using feature selection approaches to effectively remove the redundant features.Although ATTIC significantly increased the prediction accuracy of M. musculus and D. melanogaster, the improvement for H. sapiens is limited.Besides, we trained three models independently that lack generalizability to other species.We anticipate that ATTIC can be utilised as a useful tool for predicting A-to-I RNA editing sites and improving our understanding of their roles in transcriptional modification.

Figure 1 .
Figure 1.The overall framework of ATTIC.

Figure 2 .
Figure 2. The analysis of RNA sequences, encoding methods and ML algorithms.Panels (A), (B) and (C) present the sequence logos of RNAs for H. sapiens, M. musculus and D. melanogaster, respectively, with positive samples up and negative ones down.Panels (D), (E) and (F) demonstrate 10-fold CV test results of the ET classifier trained with different encoding schemes on H_3000, M_831 and D_125 datasets, respectively.Panels (G), (H) and (I) show 10-fold CV test results of six base classifiers on H_3000, M_831 and D_125 datasets, respectively.

Figure 3 .
Figure 3. Performance comparison of different combinations of classifiers and encodings.Panels (A)-(C) show the performance comparison results of five ensemble strategies for the three species; panels (D)-(F) provide the corresponding comparisons of diverse encoding combinations.

Figure 4 .
Figure 4.The performance evaluation results.Panels (A), (B) and (C) provide the comparison results of two different feature selection strategies in terms of ACC, MCC, Recall, Precision, F 1 and AUC for H. sapiens, M. musculus and D. melanogaster.(D) Comparison with existing tools on the independent test datasets.

Figure 5 .
Figure 5. Top 20 features of ATTIC ranked by the SHAP algorithm for (A) H. sapiens, (B) M. musculus and (C) D. melanogaster.Each row represents a feature, and each column represents a sample.The colour represents the range of SHAP values, where the redder colour indicates that the SHAP value is greater than zero, and the bluer colour indicates that the SHAP value is below zero.Besides, the SHAP value greater than 0 indicates that the prediction tends to be a positive sample (i.e.A-to-I RNA editing site).In contrast, a SHAP value of less than 0 indicates that the prediction is likely to be a negative sample (i.e.non-A-to-I RNA editing site).

Table 2 .
Detailed information for the benchmark datasets used in ATTIC

Table 3 .
The chemical properties of different types of nucleotides in RNA sequences Table S1 available online at http:// bib.oxfordjournals.org/,and the performance comparison results of these 132 models are provided in Supplementary Tables S2-S7 available online at http://bib.oxfordjournals.org/.Figure 2D-F shows the 10-fold CV test results of ET classifiers trained with different encoding schemes and Figure 2G-I shows the 10-fold CV test results of six base classifiers on H_3000, M_831 and D_125 datasets, respectively.For H. sapiens, nucleic acid compositionbased encodings (NCP) and residue composition-based encodings (i.e.binary, DBE and PS2) can provide better discriminative ability than KNN and ENAC.In addition, these four encodings (NCP, binary, DBE and PS2) show almost the same contributions to the prediction performance.Therefore, we selected these four encodings for H. sapiens models for further analysis.In terms of M. ing sequence-specific properties.Similarly, residue compositionbased features consider only the occurrence frequencies of individual nucleotides in a sequence, failing to consider the ordering or arrangement of these residues in the local sequence context.The aforementioned results demonstrate the value of examining numerous feature information across three benchmark datasets.In addition, the heatmap compares different base classifiers for each species, with colour indicating the level

Table 4 .
Comparison with state-of-the-art methods based on the benchmark dataset The bold values in the table represent the best performance of the corresponding metrics.